Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training
Abstract
Large-scale distributed training requires significant communication bandwidth for gradient exchange, which limits the scalability of multi-node training and demands expensive high-bandwidth network infrastructure. The situation is even worse for distributed training on mobile devices (federated learning), which suffers from higher latency, lower throughput, and intermittent poor connections. In this paper, we find that 99.9% of the gradient exchange in distributed SGD is redundant, and propose Deep Gradient Compression (DGC) to greatly reduce the communication bandwidth. To preserve accuracy during this compression, DGC employs four methods: momentum correction, local gradient clipping, momentum factor masking, and warm-up training. We have applied Deep Gradient Compression to image classification, speech recognition, and language modeling with multiple datasets including Cifar10, ImageNet, Penn Treebank, and the Librispeech Corpus. In these scenarios, Deep Gradient Compression achieves a gradient compression ratio from 270x to 600x without losing accuracy, cutting the gradient size of ResNet-50 from 97MB to 0.35MB, and that of DeepSpeech from 488MB to 0.74MB. Deep Gradient Compression enables large-scale distributed training on inexpensive commodity 1Gbps Ethernet and facilitates distributed training on mobile devices.
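To make the core mechanics concrete, the sketch below illustrates DGC-style top-0.1% gradient sparsification combined with momentum correction and momentum factor masking, written in plain NumPy. It is a minimal single-tensor sketch under stated assumptions, not the paper's implementation: the names (sparsify_with_momentum_correction, velocity, residual, sparsity) are illustrative choices, and local gradient clipping, the warm-up schedule, and the collective exchange of the sparse updates across workers are omitted.

import numpy as np

def sparsify_with_momentum_correction(grad, velocity, residual,
                                      momentum=0.9, sparsity=0.999):
    """Transmit only the largest ~0.1% of accumulated values; the rest
    stay in local accumulators and are carried to later iterations."""
    # Momentum correction: accumulate the velocity locally, then add the
    # velocity (not the raw gradient) into the residual accumulator.
    velocity = momentum * velocity + grad
    residual = residual + velocity

    # Select the top (1 - sparsity) fraction of entries by magnitude.
    k = max(1, int(residual.size * (1.0 - sparsity)))
    threshold = np.partition(np.abs(residual).ravel(), -k)[-k]
    mask = np.abs(residual) >= threshold

    # The sparse update is what a worker would communicate; transmitted
    # entries are cleared locally, and momentum factor masking also
    # zeroes the velocity at those positions to avoid stale momentum.
    sparse_update = np.where(mask, residual, 0.0)
    residual = np.where(mask, 0.0, residual)
    velocity = np.where(mask, 0.0, velocity)
    return sparse_update, velocity, residual

In a full training loop, each worker would call this per gradient tensor every iteration and exchange only the nonzero entries of sparse_update, which is the source of the 270x to 600x reduction in communicated gradient volume reported above.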